Conversation

@DajanaV DajanaV commented Nov 8, 2025

Mirrored from ggml-org/llama.cpp#16490

Not sure if there is a reason not to enable graph reuse for recurrent graphs (Mamba, hybrids, SSMs, etc.). I did a few tests and it seems to work, resulting in modest perf improvements (see the sketch and benchmark numbers below). cc @gabe-l-hart @compilade
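
For context, "graph reuse" here means keeping the compute graph built for the previous ubatch and running it again when the next ubatch would produce an identical graph, instead of rebuilding it from scratch every step. The C++ sketch below only illustrates that idea under assumed names (`graph_params`, `built_graph`, `graph_cache`, `build_graph` are hypothetical, not the actual llama.cpp API); the only detail taken from this PR is the `LLAMA_GRAPH_REUSE_DISABLE` environment variable used in the benchmark commands.

```cpp
// Illustrative sketch only -- NOT the actual llama.cpp implementation.
// All type and function names here are hypothetical.
#include <cstdlib>
#include <memory>
#include <optional>

struct graph_params {
    int  n_tokens     = 0;     // ubatch size
    bool is_recurrent = false; // e.g. a Mamba/SSM layer is present

    bool operator==(const graph_params & other) const {
        return n_tokens == other.n_tokens && is_recurrent == other.is_recurrent;
    }
};

struct built_graph {
    // nodes, tensors, buffers, ...
};

// Building the compute graph is the expensive step we want to skip when possible.
static std::unique_ptr<built_graph> build_graph(const graph_params & /*params*/) {
    return std::make_unique<built_graph>();
}

struct graph_cache {
    std::optional<graph_params>  prev_params;
    std::unique_ptr<built_graph> prev_graph;

    built_graph * get(const graph_params & params) {
        const bool reuse_disabled = std::getenv("LLAMA_GRAPH_REUSE_DISABLE") != nullptr;

        // The change benchmarked in this PR amounts to dropping an extra
        // "&& !params.is_recurrent" condition from a check like this one,
        // so recurrent/hybrid graphs become eligible for reuse too.
        if (!reuse_disabled && prev_graph && prev_params && *prev_params == params) {
            return prev_graph.get(); // reuse the previously built graph
        }

        prev_graph  = build_graph(params);
        prev_params = params;
        return prev_graph.get();
    }
};

int main() {
    graph_cache cache;

    graph_params p;
    p.n_tokens     = 1;    // token generation: same-shaped graph every step
    p.is_recurrent = true;

    built_graph * a = cache.get(p); // first call builds the graph
    built_graph * b = cache.get(p); // second call reuses it
    return (a == b) ? 0 : 1;
}
```

Since the graph shape usually stays the same from one generation step to the next, reuse should mostly help token generation (tg) rather than prompt processing (pp), which is consistent with the numbers below.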

Without graph reuse

```
make -j && LLAMA_GRAPH_REUSE_DISABLE=1 ./bin/llama-bench -m ../models/mamba-130m/ggml-model-f16.gguf -m ../models/granite-4-h-tiny/ggml-model-q8_0.gguf -m ../models/ai21-jamba-mini-1.7/ggml-model-q8_0.gguf -m ../models/liquidai-lfm2-2.6b/ggml-model-q4_k.gguf -fa 1 -t 1 -n 32
```

| model | size | params | backend | ngl | threads | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mamba 0.1B F16 | 256.96 MiB | 129.14 M | Metal | 99 | 1 | 1 | pp512 | 8415.73 ± 46.47 |
| mamba 0.1B F16 | 256.96 MiB | 129.14 M | Metal | 99 | 1 | 1 | tg32 | 322.74 ± 0.64 |
| granitehybrid ?B Q8_0 | 6.88 GiB | 6.94 B | Metal | 99 | 1 | 1 | pp512 | 2119.36 ± 3.31 |
| granitehybrid ?B Q8_0 | 6.88 GiB | 6.94 B | Metal | 99 | 1 | 1 | tg32 | 77.17 ± 0.11 |
| jamba ?B Q8_0 | 51.05 GiB | 51.57 B | Metal | 99 | 1 | 1 | pp512 | 603.47 ± 1.83 |
| jamba ?B Q8_0 | 51.05 GiB | 51.57 B | Metal | 99 | 1 | 1 | tg32 | 42.35 ± 0.02 |
| lfm2 2.6B Q4_K - Medium | 1.45 GiB | 2.57 B | Metal | 99 | 1 | 1 | pp512 | 2923.41 ± 3.20 |
| lfm2 2.6B Q4_K - Medium | 1.45 GiB | 2.57 B | Metal | 99 | 1 | 1 | tg32 | 169.83 ± 0.67 |

build: 638e2c2 (6725)

With graph reuse

```
make -j && ./bin/llama-bench -m ../models/mamba-130m/ggml-model-f16.gguf -m ../models/granite-4-h-tiny/ggml-model-q8_0.gguf -m ../models/ai21-jamba-mini-1.7/ggml-model-q8_0.gguf -m ../models/liquidai-lfm2-2.6b/ggml-model-q4_k.gguf -fa 1 -t 1 -n 32
```

| model | size | params | backend | ngl | threads | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mamba 0.1B F16 | 256.96 MiB | 129.14 M | Metal | 99 | 1 | 1 | pp512 | 8453.65 ± 20.10 |
| mamba 0.1B F16 | 256.96 MiB | 129.14 M | Metal | 99 | 1 | 1 | tg32 | 348.83 ± 1.67 |
| granitehybrid ?B Q8_0 | 6.88 GiB | 6.94 B | Metal | 99 | 1 | 1 | pp512 | 2126.12 ± 1.90 |
| granitehybrid ?B Q8_0 | 6.88 GiB | 6.94 B | Metal | 99 | 1 | 1 | tg32 | 82.26 ± 0.13 |
| jamba ?B Q8_0 | 51.05 GiB | 51.57 B | Metal | 99 | 1 | 1 | pp512 | 604.56 ± 2.08 |
| jamba ?B Q8_0 | 51.05 GiB | 51.57 B | Metal | 99 | 1 | 1 | tg32 | 43.22 ± 0.02 |
| lfm2 2.6B Q4_K - Medium | 1.45 GiB | 2.57 B | Metal | 99 | 1 | 1 | pp512 | 2928.31 ± 1.78 |
| lfm2 2.6B Q4_K - Medium | 1.45 GiB | 2.57 B | Metal | 99 | 1 | 1 | tg32 | 179.18 ± 0.47 |

build: 638e2c2 (6725)

@DajanaV DajanaV force-pushed the main branch 24 times, most recently from 98e1e20 to 2791104 on November 11, 2025 09:10
@DajanaV DajanaV force-pushed the main branch 19 times, most recently from 24733fb to 4b4bb7c on November 13, 2025 12:15
@DajanaV DajanaV closed this Nov 13, 2025